Lloyd's algorithm

Example of Lloyd's algorithm. The Voronoi diagram of the current points at each iteration is shown. The plus signs denote the centroids of the Voronoi cells.

First iteration
Second iteration
Third iteration
Fifteenth iteration
In the last image, the points are very near the centroids of the Voronoi cells. A centroidal Voronoi tessellation has been found.

In computer science and electrical engineering, Lloyd's algorithm, also known as Voronoi iteration or relaxation, is an algorithm for grouping data points into a given number of categories, used for k-means clustering.

Lloyd's algorithm is usually used in a Euclidean space, so the distance function serves as a measure of similarity between points, and averaging of each dimension for the averaging, but this need not be the case.

Lloyd's algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic. It then calculates the average point, or centroid, of each set via some metric (usually averaging dimensions in Euclidean space). It constructs a new partition by associating each point with the closest centroid, usually using the Euclidean distance function. Then the centroids are recalculated for the new clusters, and algorithm repeated by alternate application of these two steps until convergence, which is obtained when the points no longer switch clusters (or alternatively centroids are no longer changed).

More formally:

Lloyd's algorithm starts with an initial distribution of samples or points and consists of repeatedly executing one relaxation step:

Each time a relaxation step is performed, the points are left in a slightly more even distribution: closely spaced points move further apart, and widely spaced points move closer together. In one dimension, this algorithm has been shown to converge to a centroidal Voronoi diagram, also named a centroidal Voronoi tessellation (Du, Emelianenko & Ju 2006). In higher dimensions, some slightly weaker convergence results are known (Sabin 1986), (Emelianenko, Ju & Rand 2009).

Since the algorithm converges slowly, and, due to limitations in numerical precision the algorithm will often not converge, real-world applications of Lloyd's algorithm stop once the distribution is "good enough." One common termination criterion is when the maximum distance a point moves in one iteration is below some set limit.

Lloyd's method is used in computer graphics because the resulting distribution has blue noise characteristics (see also Colors of noise), meaning there are few low-frequency components that could be interpreted as artifacts. It is particularly well-suited to picking sample positions for dithering.

Lloyd's algorithm is also used to generate dot drawings in the style of stippling (Deussen et al. 2000). In this application, the centroids can be weighted based on a reference image (Secord 2002) to produce stipple illustrations matching an input image.

See also

References